
Conversation

@davidkyle
Member

@davidkyle davidkyle commented Sep 17, 2024

#111684 improved the model install time by using multiple streams and threads to download and write the model parts. The change was reverted in #112961 after it was discovered to be the cause of Out Of Memory exceptions.

The design relied on a fixed-size thread pool to limit the number of concurrent downloads and hence also manage memory usage. However, the indexing of the downloaded documents was performed asynchronously, which meant a new download request would be forked and executed while the write request was still in flight, leading to a large number of in-flight requests. The fix here is to block on the index write.
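For illustration, here is a minimal sketch of the pattern, assuming a fixed pool of 5 download threads; the class and helper names are made up and this is not the actual ml-package-loader code:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Minimal sketch of the fix, assuming a fixed pool of 5 download threads.
// Class and helper names are illustrative, not the real Elasticsearch code.
public class BoundedModelDownload {

    static final int MAX_STREAMS = 5; // the fixed pool size caps concurrent downloads

    public static void main(String[] args) throws InterruptedException {
        ExecutorService downloadPool = Executors.newFixedThreadPool(MAX_STREAMS);
        int numParts = 20;

        for (int part = 0; part < numParts; part++) {
            final int partIndex = part;
            downloadPool.execute(() -> {
                byte[] bytes = downloadPart(partIndex); // stream one model part
                indexPart(partIndex, bytes);            // blocking write: returns only once indexed
                // Because indexPart blocks, a worker cannot pick up another download task
                // until its write has completed, so in-flight requests never exceed MAX_STREAMS.
            });
        }

        downloadPool.shutdown();
        downloadPool.awaitTermination(1, TimeUnit.MINUTES);
    }

    // Stand-ins for the real download and index calls.
    static byte[] downloadPart(int partIndex) {
        return new byte[1024];
    }

    static void indexPart(int partIndex, byte[] bytes) {
        // In the PR this is client.execute(...).actionGet(), which waits for the response.
    }
}
```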

The first commit is the revert of the revert; the later commits introduce the blocking write and reuse a byte buffer that was previously recreated for every downloaded part. Allocating that byte buffer is now protected by a circuit breaker.
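For context, circuit-breaker-guarded allocations in Elasticsearch typically follow a reserve/allocate/release pattern against the CircuitBreaker interface. A rough sketch, with a made-up wrapper class and breaker label (not the code in this PR):

```java
import org.elasticsearch.common.breaker.CircuitBreaker;

// Rough sketch of a breaker-guarded buffer allocation; the wrapper class and the
// "model_download_buffer" label are illustrative, not the actual PR code.
public class GuardedBufferAllocator {

    private final CircuitBreaker breaker;

    public GuardedBufferAllocator(CircuitBreaker breaker) {
        this.breaker = breaker;
    }

    // Reserve the bytes with the breaker first; this throws a CircuitBreakingException
    // if the allocation would push memory use over the breaker's limit.
    public byte[] allocate(int chunkSizeBytes) {
        breaker.addEstimateBytesAndMaybeBreak(chunkSizeBytes, "model_download_buffer");
        return new byte[chunkSizeBytes];
    }

    // Release the reservation once the reused buffer is no longer needed.
    public void release(int chunkSizeBytes) {
        breaker.addWithoutBreaking(-chunkSizeBytes);
    }
}
```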

Labelled as a non-issue because the code that caused the OOM was reverted before it made it into a production environment.

@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

Member

@pxsalehi pxsalehi left a comment


LGTM for the newer changes, although ideally there should be a test that shows the number of parallel downloads is limited. In any case, I'm gonna leave the final LGTM to ML, if that's ok.

true
);

client.execute(PutTrainedModelDefinitionPartAction.INSTANCE, modelPartRequest).actionGet();
Member


This is the part that is blocking and/or slowing down the indexing, correct? We'll wait for the client response rather than continue asynchronously?

Member Author


The gain is from using 5 threads to stream the download and index the parts. The non-blocking write meant that we had more than 5 in-flight requests (the download is faster than the indexing), and that was causing the OOM. In order to limit the number of requests to at most 5 there has to be some element of blocking. Model download uses a dedicated thread pool, so the block does not starve other parts of the code of resources.
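To make the contrast concrete, a hedged sketch of the two call styles; `client` and `modelPartRequest` come from the code quoted above, and the listener variant only shows the general shape of a non-blocking write, not the exact code that was reverted:

```java
// Assumes Elasticsearch's Client, ActionListener and PutTrainedModelDefinitionPartAction
// from the surrounding ml-package-loader code.

// Non-blocking style: execute() hands the response to a listener and returns immediately,
// so the caller can start the next download while this write is still in flight.
client.execute(
    PutTrainedModelDefinitionPartAction.INSTANCE,
    modelPartRequest,
    ActionListener.wrap(response -> { /* part indexed */ }, e -> { /* handle failure */ })
);

// Blocking style (this PR): actionGet() waits for the response, so each of the 5 download
// threads has at most one index request in flight at a time.
client.execute(PutTrainedModelDefinitionPartAction.INSTANCE, modelPartRequest).actionGet();
```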

Comment on lines +128 to +134
for (int i = 0; i < ranges.size() - 1; i++) {
assertThat(ranges.get(i).rangeStart(), is(startBytes));
long end = startBytes + ((long) ranges.get(i).numParts() * chunkSize) - 1;
assertThat(ranges.get(i).rangeEnd(), is(end));
long expectedNumBytesInRange = (long) chunkSize * ranges.get(i).numParts() - 1;
assertThat(ranges.get(i).rangeEnd() - ranges.get(i).rangeStart(), is(expectedNumBytesInRange));
assertThat(ranges.get(i).startPart(), is(startPartIndex));
Member


nice

@davidkyle
Member Author

@pxsalehi I pushed a small change to add a circuit breaker check before allocating the byte buffer. Please can you take a look at 78905e5? Thanks

@davidkyle
Member Author

@elasticmachine update branch

Member

@pxsalehi pxsalehi left a comment


> @pxsalehi I pushed a small change to add a circuit breaker check before allocating the byte buffer. Please can you take a look at 78905e5? Thanks

LGTM. Thanks.

@davidkyle davidkyle merged commit 7a0f4ee into main Sep 25, 2024
16 checks passed
@davidkyle davidkyle deleted the limit-in-flight-requests branch September 25, 2024 09:10
davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Sep 25, 2024
…stic#112992)

Restores the changes from elastic#111684, which uses multiple streams to improve the
time to download and install the built-in ML models. The first iteration had a problem
where the number of in-flight requests was not properly limited, which is fixed here.
Additionally, there are now circuit breaker checks on allocating the buffer used to
store the model definition.
@elasticsearchmachine
Collaborator

💔 Backport failed

| Status | Branch | Result |
| --- | --- | --- |
| | 8.x | |
| | 8.15 | Commit could not be cherrypicked due to conflicts |

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 112992`

elasticsearchmachine pushed a commit that referenced this pull request Sep 25, 2024
…2992) (#113514)

Restores the changes from #111684, which uses multiple streams to improve the
time to download and install the built-in ML models. The first iteration had a problem
where the number of in-flight requests was not properly limited, which is fixed here.
Additionally, there are now circuit breaker checks on allocating the buffer used to
store the model definition.
davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Sep 27, 2024
…stic#112992)

Restores the changes from elastic#111684, which uses multiple streams to improve the
time to download and install the built-in ML models. The first iteration had a problem
where the number of in-flight requests was not properly limited, which is fixed here.
Additionally, there are now circuit breaker checks on allocating the buffer used to
store the model definition.
# Conflicts:
#	x-pack/plugin/ml-package-loader/src/main/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackage.java
#	x-pack/plugin/ml-package-loader/src/test/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackageTests.java
elasticsearchmachine pushed a commit that referenced this pull request Sep 27, 2024
…2992) (#113710)

Restores the changes from #111684, which uses multiple streams to improve the
time to download and install the built-in ML models. The first iteration had a problem
where the number of in-flight requests was not properly limited, which is fixed here.
Additionally, there are now circuit breaker checks on allocating the buffer used to
store the model definition.
# Conflicts:
#	x-pack/plugin/ml-package-loader/src/main/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackage.java
#	x-pack/plugin/ml-package-loader/src/test/java/org/elasticsearch/xpack/ml/packageloader/action/TransportLoadTrainedModelPackageTests.java